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Abstract 

In this paper, we propose low complexity algorithms based on Markov chain Monte Carlo (MCMC) 
technique for signal detection and channel estimation on the uplink in large scale multiuser multiple input 
multiple output (MIMO) systems with tens to hundreds of antennas at the base station (BS) and similar 
number of uplink users. A BS receiver that employs a randomized sampling method (which makes 
a probabilistic choice between Gibbs sampling and random sampling in each iteration) for detection 
and a Gibbs sampling based method for channel estimation is proposed. The algorithm proposed for 
detection alleviates the stalling problem encountered at high SNRs in conventional MCMC algorithm 
and achieves near-optimal performance in large systems. A novel ingredient in the detection algorithm 
that is responsible for achieving near-optimal performance at low complexities is the joint use of a 
randomized MCMC (R-MCMC) strategy coupled with a multiple restart strategy with an efficient restart 
criterion. Near-optimal detection performance is demonstrated for large number of BS antennas and 
users (e.g., 64, 128, 256 BS antennas/users). The proposed MCMC based channel estimation algorithm 
refines an initial estimate of the channel obtained during pilot phase through iterations with R-MCMC 
detection during data phase. In time division duplex (TDD) systems where channel reciprocity holds, these 
channel estimates can be used for multiuser MIMO precoding on the downlink. Further, we employ this 
receiver architecture in the frequency domain for receiving cyclic prefixed single carrier (CPSC) signals 
on frequency selective fading between users and the BS. The proposed receiver achieves performance 
that is near optimal and close to that with perfect channel knowledge. 

Keywords — Large-scale multiuser MIMO, Markov chain Monte Carlo technique, Gibbs sampling, stalling problem 
randomized sampling, multiple restarts, detection, channel estimation, cyclic prefixed single carrier system. 
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I. Introduction 

Capacity of multiple-input multiple-output (MIMO) wireless channels is known to increase linearly 
with the minimum of the number of transmit and receive antennas HI- HI- Large-scale MIMO systems 
with tens to hundreds of antennas have attracted much interest recently 0- iTTTl . The motivation to 
consider such large-scale MIMO systems is the potential to practically realize the theoretically predicted 
benefits of MIMO, in terms of very high spectral efficiencies/sum rates, increased reliability and power 
efficiency through the exploitation of large spatial dimensions. Use of large number of antennas is getting 
recognized to be a good approach to fulfill the increased throughput requirements in future wireless 
systems. Particularly, large multiuser MIMO wireless systems where the base station (BS) has tens to 
hundreds of antennas and the users have one or more antennas are widely being investigated |9], lfl2l - ifTTl . 
Communications on the uplink lfT3l . lfl6l as well as on the downlink |9), lfl4l . lfl31 in such large systems 
are of interest. Key issues in large multiuser MIMO systems on the downlink include low complexity 
precoding strategies and pilot contamination problem encountered in using non-orthogonal pilot sequences 
for channel estimation in multi-cell scenarios |[T4l . In large multiuser MIMO systems on the uplink, users 
with one or more antennas transmit simultaneously to the BS with large number of antennas, and their 
signals are separated at the BS using the spatial signatures of the users. Sophisticated signal processing is 
required at the BS receiver to extract the signal of each user from the aggregate received signal 0]]. Use 
of large number of BS antennas has been shown to improve the power efficiency of uplink transmissions 
in multiuser MIMO using linear receivers at the BS [16]. Linear receivers including matched filter (MF) 
and minimum mean square error (MMSE) receivers are shown to be attractive for very large number of 
BS antennas fT3ll . Our focus in this paper is on achieving near-optimal receiver performance at the BS 
in large multiuser MIMO systems on the uplink at low complexities. The receiver functions we consider 
include signal detection and channel estimation. The approach we adopt for both detection as well as 
channel estimation is Markov chain Monte Carlo (MCMC) approach. 

The uplink multiuser MIMO architecture can be viewed as a point-to-point MIMO system with co- 
located transmit antennas with adequate separation between them (so that there is no or negligible spatial 
correlation among them), and no cooperation among these transmit antennas H. Because of this, receiver 
algorithms for point-to-point MIMO systems are applicable for receiving uplink multiuser MIMO signals 
at the BS receiver. Recently, there has been encouraging progress in the development of low-complexity 
near-optimal MIMO receiver algorithms that can scale well for large dimensions (H, ifTOll . |[T8l - |[25l . 
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These algorithms are based on techniques from local neighborhood search including tabu search (SI, ifTOlh 
EH- EEL probabilistic data association ll22l . and message passing on graphical models including factor 
graphs and Markov random fields (23], (2H, 1251 . 

Another interesting class of low-complexity algorithms reported in the context of CDMA and MIMO 
detection is based on Markov chain Monte Carlo (MCMC) simulation techniques (26l- 021. MCMC 
techniques are computational techniques that make use of random numbers IT341 . MCMC methods 
have their roots in the Metropolis algorithm, an attempt by physicists to compute complex integrals 
by expressing them as expectations for some distribution and then estimating this expectation by drawing 
samples from that distribution 1351 . ll36l . In MCMC methods, statistical inferences are developed by 
simulating the underlying processes through Markov chains. By doing so, it becomes possible to reduce 
exponential detection complexity to linear/polynomial complexities. An issue with conventional MCMC 
based detection, however, is the stalling problem, due to which performance degrades at high SNRs 1271 . 
Stalling problem arises because transitions from some states to other states in a Markov chain can occur 
with very low probability 1271 . Our first contribution in this paper is that we propose an MCMC based 
detection algorithm that alleviates the stalling problem encountered in conventional MCMC and achieves 
near-optimal performance in large systems. A key idea that is instrumental in alleviating the stalling 
problem is a randomized sampling strategy that makes a probabilistic choice between Gibbs sampling 
and random sampling in each iteration. An efficient stopping criterion aids complexity reduction. This 
randomized sampling strategy, referred to as randomized MCMC (R-MCMC) strategy, is shown to achieve 
near-optimal performance in multiuser MIMO systems with 16 to 256 BS antennas and same number 
of uplink users for 4-QAM 1371 . However, we find that this randomized sampling strategy alone is not 
adequate to achieve near-optimal performance at low complexities for higher-order QAM (e.g., 16-QAM, 
64-QAM). We show that near-optimal performance is achieved in higher-order QAM also if a multiple 
restart strategy is performed in conjunction with R-MCMC. We refer to this strategy as 'R-MCMC 
with restarts' (R-MCMC-R) strategy. Here again, an efficient restart criterion aids complexity reduction. 
The joint use of both randomized sampling as well as multiple restart strategies is found to be crucial 
in achieving near-optimal performance for higher-order QAM in large systems. To our knowledge, the 
closeness to optimal performance achieved by the proposed R-MCMC-R algorithm for tens to hundreds 
of BS antennas/users with higher-order QAM has not been reported so far using other MCMC based 
algorithms in the literature. 
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Channel estimation at the BS is an important issue in large multiuser MIMO systems on the uplink. 
While channel estimation at the BS is needed for uplink signal detection, in TDD systems where channel 
reciprocity holds, the estimated channel can also be used for precoding purposes on the downlink avoiding 
the need for feeding back channel estimates from the users. Our second contribution in this paper is that 
we propose an MCMC based uplink channel estimation algorithm at the BS receiver. The algorithm 
employs Gibbs sampling to refine an initial estimate of the channel obtained during the pilot phase, 
through iterations with R-MCMC-R detection during the data phase. The algorithm is shown to yield 
good mean square error (MSE) and bit error rate (BER) performance in large multiuser MIMO systems 
(e.g., 128 BS antennas/users). BER Performance close to that with perfect channel knowledge is achieved. 
Finally, we employ the proposed MCMC based algorithms for equalization and channel estimation in 
frequency selective fading between the users and BS. Because of i) their advantage of avoiding the 
peak-to-average power ratio (PAPR) problem that is encountered in multicarrier systems and ii) their 
good performance, cyclic prefixed single carrier (CPSC) transmissions ll39l - l4li from the uplink users 
are considered. The proposed MCMC algorithms are shown to work well in receiving uplink CPSC 
transmissions in frequency selective fading as well. 

The rest of the paper is organized as follows. The uplink multiuser MIMO system model on frequency 
non-selective fading is presented in Section HH The proposed R-MCMC algorithm without and with 
multiple restarts and its performance/complexity results in frequency non-selective fading are presented 
in Section Hill Section [IV] presents the proposed MCMC based channel estimation algorithm and its 
performance. Section|V]presents the proposed receiver for frequency domain equalization of CPSC signals 
and channel estimation in frequency selective fading. Conclusions are presented in Section ED 

II. System Model 

Consider a large-scale multiuser MIMO system on the uplink consisting of a BS with N receive 
antennas and K homogeneous uplink users with one transmit antenna each, K < N (Fig. Q]). Extension 
to system model with non-homogeneous users where different users can have different number of transmit 
antennas is straightforward. N and K are in the range of tens to hundreds. All users transmit symbols 
from a modulation alphabet B. It is assumed that synchronization and sampling procedures have been 
carried out, and that the sampled base band signals are available at the BS receiver. Let xu G B denote 
the transmitted symbol from user k. Let x c = [x±,X2, ■ ■ ■ ,xk] T denote the vector comprising of the 
symbols transmitted simultaneously by all users in one channel use. Let H c £ C , given by H c = 
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[hi, I12, • • • , hx], denote the channel gain matrix, where h^ = [hik, h 2 k, • • • , h^^Y is the channel gain 
vector from user k to BS, and hjk denotes the channel gain from A;th user to jth receive antenna at the 
BS. Assuming rich scattering and adequate spatial separation between users and BS antenna elements, 
hjk,Vj are assumed to be independent Gaussian with zero mean and a\ variance such that a 2 = K. 
a\ models the imbalance in received powers from different users, and a\ = \ corresponds to the perfect 
power control scenario. This channel gain model amounts to assuming that the multipath fading between 
a user and BS is frequency non-selective. Frequency selective fading is considered in Section [V] Now, 
the received signal vector at the BS in a channel use, denoted by y c G C N , can be written as 

y c = H c x c + n c , (1) 

where n c is the noise vector whose entries are are modeled as i.i.d. CJ\f(0, a 2 ). We will work with the 
real-valued system model corresponding to dH), given by 

y r = H r x r + n r , (2) 

where x r G R 2K , H r G R 2Nx2K , y r G R 2N , n r G R 2N given by 
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Dropping the subscript r in © for notational simplicity, the real-valued system model is written as 

y = Hx + n. (3) 

For a QAM alphabet B, the elements of x will take values from the underlying PAM alphabet A, i.e., 
x G A 2A . The symbols from all the users are jointly detected at the BS. The maximum likelihood (ML) 
decision rule is given by 

"Xml = arg m i n lly — Hxll 2 = arg min /(x), (4) 

xgA 2A ' xeA 2A " 

where /(x) = x T H T Hx — 2y T Hx is the ML cost. While the ML detector in dU) is exponentially 
complex in K (which is prohibitive for large K), the MCMC based algorithms we propose in the next 
section have quadratic complexity in K and they achieve near-ML performance as well. 

III. Proposed Randomized-MCMC Algorithm for Detection 

The ML detection problem in (0]) can be solved by using MCMC simulations ll34l . We consider Gibbs 
sampler, which is an MCMC method used for sampling from distributions of multiple dimensions. In the 
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context of MIMO detection, the joint probability distribution of interest is given by 

p(xi,--' ,^2ftr|y,H) oc exp ( - - y Jj* X - y (5) 

We assume frequency non-selective fading and perfect knowledge of channel gain matrix H at the BS 
receiver in this section. We will relax the perfect channel knowledge assumption by proposing a MCMC 
based channel estimation algorithm in section HVl 

A. Conventional MCMC algorithm 

In conventional Gibbs sampling based detection, referred to as conventional MCMC algorithm, the 
algorithm starts with an initial symbol vector, denoted by x(* =0 ). In each iteration of the algorithm, an 
updated symbol vector is obtained by sampling from distributions as follows: 

(t+l) , , (t) (t) (t) X 

(t+l) / | (t+l) (t) (t) \ 

x\ ~ p{x 2 \x\ ',xy,--- ,xy K ), 

x^ ^ ~ p(rE3|2;^ , a; 2 ,2^4 , ■■■ , x^r)' 

4^ ~ pfxa,, ,x^i\). (6) 

The detected symbol vector in a given iteration is chosen to be that symbol vector which has the least ML 
cost in all the iterations up to that iteration. Another MCMC algorithm that uses a temperature parameter 
a and the following joint distribution is presented in |[33l : 

p(xi, ■ ■ ■ ,x 2 ftr|y,H) oc exp ( - - y a2 ^ 2 X - y (7) 

The algorithm uses a fixed value of a in all the iterations, with the property that after the Markov chain is 
mixed, the probability of encountering the optimal solution is only polynomially small (not exponentially 
small). This algorithm and the conventional MCMC algorithm (which is a special case of a = 1) face 
stalling problem at high SNRs; a problem in which the BER performance gets worse at high SNRs fl27l . 

B. Proposed R-MCMC algorithm 

It is noted that the stalling problem occurs due to MCMC iterations getting trapped in poor local 
solutions, beyond which the ML cost does not improve with increasing iterations for a long time. 
Motivated by this observation, we propose a simple, yet effective, randomization strategy to avoid such 
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traps. The key idea behind the proposed randomized MCMC (R-MCMC) approach is that, in each 
iteration, instead of updating x\ s as per the update rule in Q with probability 1 as in conventional 
MCMC, we update them as per (O with probability (1— <&) and use a different update rule with probability 
qi = Jg. The different update rule is as follows. Generate | A| probability values from uniform distribution 

as 

p( x f) =j)~u[0,l], Vie A 

such that 2J p(x\ = j) = L and sample x\ from this generated pmf. 

i=i 

7 J Proposed stopping criterion: A suitable termination criterion is needed to stop the algorithm. A 
simple strategy is to terminate the algorithm after a fixed number of iterations. But a fixed value of 
number of iterations may not be appropriate for all scenarios. Fixing a large value for the number of 
iterations can yield good performance, but the complexity increases with the number of iterations. To 
address this issue, we develop a dynamic stopping criterion that yields good performance without unduly 
increasing the complexity. The criterion works as follows. A stalling is said to have occurred if the ML 
cost remains unchanged in two consecutive iterations. Once such a stalling is identified, the algorithm 
generates a positive integer <d s (referred to as the stalling limit), and the iterations are allowed to continue 
in stalling mode (i.e., without ML cost change) up to a maximum of Q s iterations from the occurrence of 
stalling. If a lower ML cost is encountered before @ s iterations, the algorithm proceeds with the newly 
found lower ML cost; else, the algorithm terminates. If termination does not happen through stalling 
limit as above, the algorithm terminates on completing a maximum number of iterations, MAX-ITER. 

The algorithm chooses the value of @ s depending on the quality of the stalled ML cost, as follows. 
A large value for S is preferred if the quality of the stalled ML cost is poor, because of the available 
potential for improvement from a poor stalled solution. On the other hand, if the stalled ML cost quality 
is already good, then a small value of Q s is preferred. The quality of a stalled solution is determined 
in terms of closeness of the stalled ML cost to a value obtained using the statistics (mean and variance) 
of the ML cost for the case when x is detected error-free. Note that when x is detected error-free, the 
corresponding ML cost is nothing but ||n|| 2 , which is Chi-squared distributed with 2N degrees of freedom 
with mean No 2 and variance No 4 . We define the quality metric to be the difference between the ML 
cost of the stalled solution and the mean of ||n|| 2 , scaled by the standard deviation, i.e., the quality metric 
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of vector x is defined as 



We refer to the metric in ([8]) as the standardized ML cost of solution vector x. A large value of 0(x) 
can be viewed as an indicator of increased closeness of x to ML solution. Therefore, from the previous 
discussion, it is desired to choose the stalling limit © s to be an increasing function of <^>(x). For this 
purpose, we choose an exponential function of the form 

e s (0(x)) = ciex P (0(x)). (9) 

Also, we allow a minimum number of iterations (c m j n ) following a stalling. Based on the above discussion, 
we adopt the following rule to compute the stalling count: 

6 s (x) = [max (cmin, ci exp 0(x)))] . (10) 

The constant c\ is chosen depending upon the QAM size; a larger c\ is chosen for larger QAM size. As we 
will see in the performance and complexity results, the proposed randomization in the update rule and the 
stopping criterion are quite effective in achieving low complexity as well as near-optimal performance. 
A complete listing of the proposed R-MCMC algorithm incorporating the randomized sampling and 
stopping criterion ideas is given in the next page. 

2) Performance and complexity of the R-MCMC algorithm: The simulated BER performance and 
complexity of the proposed R-MCMC algorithm in uplink multiuser MIMO systems with 4-QAM are 
shown in Figs. [2] to [6] The following R-MCMC parameters are used in the simulations: c m i n = 10, 
ci = 20, MAX-ITER = 16K. Figures [2] to [5] are for the case where there is no imbalance in the 
received powers of all users, i.e., a\ = dB M k. Perfect channel knowledge at the BS is assumed. The 
performance of R-MCMC in multiuser MIMO with K = N = 16 is shown in Fig. [2] The performance of 
the MCMC algorithm using the distribution in (O with temperature parameter values a = 1, 1.5, 2, 3 are 
also plotted. 16K iterations are used in the MCMC algorithm with temperature parameter. Sphere decoder 
performance is also shown for comparison. It is seen that the performance of MCMC with temperature 
parameter is very sensitive to the choice of the value of a. For example, for a = 1, 1.5, the BER is 
found to degrade at high SNRs due to stalling problem. For a = 2, the performance is better at high 
SNRs but worse at low SNRs. The proposed R-MCMC performs better than MCMC with temperature 
parameter (or almost the same) at all SNRs and a values shown. In fact, the performance of R-MCMC is 
almost the same as the sphere decoder performance. The R-MCMC complexity is, however, significantly 
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Algorithm 1 Proposed randomized-MCMC algorithm 
l: input: y, H, x' '; 

x(°) : initial vector G A 2A '; MAX-ITER: max. # iterations; 
2-.t = 0; z = x(°); S = {1, 2, • • • , 2K}; S : index set; 
3: P = /(x^ )); /(.) : ML cost function; 6 S (.) : stalling limit function; 
4: while t < MAX-ITER do 
5: for i = 1 to 2K do 

6: randomly choose an index k from the set S; 
7: if (i / k) then 

»• r (t+1) ~ 71 ('r lT (t+1) ••• t ( * +1) r (t) ••• r (t)s l 

9: else 

10: generate pmf p(xf +1) = j) ~ U[Q, 1], Vj G A 

11: sample from this pmf 

12: end if 

13: end for 

14: 7 = /( X ( i + 1 )); 

15: if (7 < p) then 

16: z = x(* +1 ); P = T , 
17: end if 

18: t = t+l; 

19: /# = /3; 

20: if 4* } == then 

21: calculate s (z); 

22: if @ s < t then 

23: if == 4*" es) then 

24: goto step 29 

25: end if 

26: end if 

27: end if 

28: end while 

29: output: z. z : output solution vector 

lower than the sphere decoding complexity. While sphere decoder gets exponentially complex in K at 
low SNRs, the R-MCMC complexity (in average number of real operations per bit) is only 0(K 2 ) as can 
be seen in Fig. [3] Because of this low complexity, the R-MCMC algorithm scales well for large-scale 
systems with large values of K and N. This is illustrated in Fig. [4] and [5] where performance plots 
for systems up to K = N = 128 and 256 are shown. While Fig. |4] shows the BER as a function of 
SNR, Fig. [5] shows the average received SNR required to achieve a target BER of 10~ 3 as a function 
of K = N. Since sphere decoder complexity is prohibitive for hundreds of dimensions, we have plotted 
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unfaded single-input single-output (SISO) AWGN performance as a lower bound on ML performance 
for comparison. It can be seen that R-MCMC achieves performance which is very close to SISO AWGN 
performance for large K = N, e.g., close to within 0.5 dB at 10~ 3 BER for K = N = 128 and 256. 
This illustrates the achievability of near-optimal performance using R-MCMC for large systems. Figure [6] 
shows the BER performance in multiuser MIMO systems with received power imbalance among different 
users. The imbalance is simulated by choosing different a\ for different users, with a\ being uniformly 
distributed between -3 dB to 3 dB. Performance in systems with K = N = 16 and 128 are plotted with 
and without power imbalance. It is seen that even with power imbalance R-MCMC achieves almost the 
same performance as that of sphere decoder for K = N = 16. 

C. Multi-restart R-MCMC algorithm for higher-order QAM 

Although the R-MCMC algorithm is very attractive in terms of both performance as well as complexity 
for 4-QAM, its performance for higher-order QAM is far from optimal. This is illustrated in Fig. |7J where 
R-MCMC is seen to achieve sphere decoder performance for 4-QAM, whereas for 16-QAM and 64-QAM 
it performs poorly compared to sphere decoder. This observation motivates the need for ways to improve 
R-MCMC performance in higher-order QAM. Interestingly, we found that use of multiple restarts^ coupled 
with R-MCMC is able to significantly improve performance and achieve near-ML performance in large 
systems with higher-order QAM. 

1 ) Effect of restarts in R-MCMC and conventional MCMC: In Figs. HJa) and (Mb), we compare the 

effect of multiple random restarts in R-MCMC and conventional MCMC algorithms for 4-QAM and 

16-QAM, respectively. For a given realization of x, H and n, we ran both algorithms for three different 

random initial vectors, and plotted the least ML cost up to nth iteration as a function of n. We show 

the results of this experiment for multiuser MIMO with K = N = 16 at 1 1 dB SNR for 4-QAM and 

18 dB SNR for 16-QAM (these SNRs give about 1CT 3 BER with sphere decoding for 4-QAM and 

16-QAM, respectively). The true ML vector cost (obtained through sphere decoder simulation for the 

same realization) is also plotted. It is seen that R-MCMC achieves much better least ML cost compared 

to conventional MCMC. This is because conventional MCMC gets locked up in some state (with very 

low state transition probability) for long time without any change in ML cost in subsequent iterations, 

'it is noted that multiple restarts, also referred to as running multiple parallel Gibbs samplers, have been tried with conventional 
and other variants of MCMC in 1271 . 1291 , 1301 . But the stalling problem is not fully removed and near-ML performance is not 
achieved. It turns out that restarts when coupled with R-MCMC is very effective in achieving near-ML performance. 
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whereas the randomized sampling strategy in R-MCMC is able to exit from such states quickly and give 
improved ML costs in subsequent iterations. This shows that R-MCMC is preferred over conventional 
MCMC. Even more interestingly, comparing the least ML costs of 4-QAM and 16-QAM (in Figs. Oa) 
and (b), respectively), we see that all the three random initializations could converge to almost true 
ML vector cost for 4-QAM within 100 iterations, whereas only initial vector 3 converges to near true 
ML cost for 16-QAM and initial vectors 1 and 2 do not. Since any random initialization works well 
with 4-QAM, R-MCMC is able to achieve near-ML performance without multiple restarts for 4-QAM. 
However, it is seen that 16-QAM performance is more sensitive to the initialization, which explains the 
poor performance of R-MCMC without restarts in higher-order QAM. MMSE vector can be used as an 
initial vector, but it is not a good initialization for all channel realizations. This points to the possibility 
of achieving good initializations through multiple restarts to improve the performance of R-MCMC in 
higher-order QAM. 

2) R-MCMC with multiple restarts: In R-MCMC with multiple restarts, we run the basic R-MCMC 
algorithm multiple times, each time with a different random initial vector, and choose that vector with the 
least ML cost at the end as the solution vector. Figure [9] shows the improvement in the BER performance 
of R-MCMC as the number of restarts (R) is increased in multiuser MIMO with K = N = 16 and 
16-QAM at SNR = 18 dB. 300 iterations are used in each restart. It can be observed that, though BER 
improves with increasing R, much gap still remains between sphere decoder performance and R-MCMC 
performance even with R = 10. A larger R could get the R-MCMC performance close to sphere decoder 
performance, but at the cost of increased complexity. While a small R results in poor performance, a 
large R results in high complexity. So, instead of arbitrarily fixing R, there is a need for a good restart 
criterion that can significantly enhance the performance without incurring much increase in complexity. 
We devise one such criterion below. 

3 ) Proposed restart criterion: At the end of each restart, we need to decide whether to terminate the 
algorithm or to go for another restart. To do that, we propose to use 

• the standardized ML costs (given by ([8])) of solution vectors, and 

• the number of repetitions of the solution vectors. 

Nearness of the ML costs obtained so far to the error-free ML cost in terms of its statistics can allow 
the algorithm to get near ML solution. Checking for repetitions can allow restricting the number of 
restarts, and hence the complexity. In particular, we define multiple thresholds that divide the range of 
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the distribution of ||n|| 2 (i.e., M + ) into multiple regions, and define one integer threshold for each of these 
regions for the purpose of comparison with number of repetitions. We use the minimum standardized 
ML cost obtained so far and its number of repetitions to decide the credibility of the solution. In Fig [TOj 
we plot histograms of the standardized ML cost of correct and incorrect solution vectors at the output of 
R-MCMC with restarts in multiuser MIMO with K = N = 8 and 4-/16-QAM. We judge the correctness 
of the obtained solution vector from R-MCMC output by running sphere decoder simulation for the same 
realizations. It can be observed in Fig. [10] that the incorrect standardized ML cost density does not stretch 
into negative values. Hence, if the obtained solution vector has negative standardized ML cost, then it 
can indeed be correct with high probability. But as the standardized ML cost increases in the positive 
domain, the reliability of that vector decreases and hence it would require more number of repetitions 
for it to be trusted as the final solution vector. It can also be observed from Fig. [10] that the incorrect 
density in case of 16-QAM is much more than that of 4-QAM for the same SNR. So it is desired that, 
for a standardized ML cost in the positive domain, the number of repetitions needed to declare as the 
final solution should increase with the QAM size. Accordingly, the number of repetitions needed for 
termination (P) is chosen as per the following expression: 

P = [max (0, c 2 <Kx))J + 1, (11) 

where x is the solution vector with minimum ML cost so far. Now, denoting R m ax to be the maximum 
number for restarts, the proposed R-MCMC with restarts algorithm (we refer to this as the R-MCMC-R 
algorithm) can be stated as follows. 

• Step 1: Choose an initial vector. 

. Step 2: Run the basic R-MCMC algorithm in Sec. IIII-BI 

• Step 3: Check if R m ax number of restarts have been completed. If yes, go to Step 5; else go to 
Step 4. 

• Step 4: For the solution vector with minimum ML cost obtained so far, find the required number 
of repetitions needed using (flTI) . Check if the number of repetitions of this solution vector so far 
is less than the required number of repetitions computed in Step 4. If yes, go to Step 1, else go to 
Step 5. 

• Step 5: Output the solution vector with the minimum ML cost so far as the final solution. 

4) Performance and complexity of the R-MCMC-R Algorithm: The BER performance and complexity 
of the R-MCMC-R algorithm are evaluated through simulations. The following parameters are used in 
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the simulations of R-MCMC and R-MCMC-R: c min = 10, c x = 101og 2 M (i.e., c 2 = 20,40,60 for 
4-/16-/64-QAM, respectively), MAX-ITER = 8KVM, R max = 50, and c 2 = 0.51og 2 M. In Fig. (ED 
we compare the BER performance of conventional MCMC, R-MCMC, R-MCMC-R and sphere decoder 
in multiuser MIMO with K = N = 16 and 16-QAM. In the first restart, MMSE solution vector is 
used as the initial vector. In the subsequent restarts, random initial vectors are used. For 64-QAM, the 
randomized sampling is applied only to the one-symbol away neighbors of the previous iteration index; 
this helps to reduce complexity in 64-QAM. From Fig. [TTJ it is seen that the performance of conventional 
MCMC, either without or with restarts, is quite poor. That is, using restarts in conventional MCMC is 
not of much help. This shows the persistence of the stalling problem. The performance of R-MCMC 
(without restarts) is better than conventional MCMC with and without restarts, but its performance still 
is far from sphere decoder performance. This shows that R-MCMC alone (without restarts) is inadequate 
the alleviate the stalling problem in higher-order QAM. However, the randomized sampling in R-MCMC 
when used along with restarts (i.e., R-MCMC-R) gives strikingly improved performance. In fact, the 
proposed R-MCMC-R algorithm achieves almost sphere decoder performance (close to within 0.4 dB at 
10~ 3 BER). This points to the important observations that application of any one of the two features, 
namely, randomized sampling and restarts, to the conventional MCMC algorithm is not adequate, and that 
simultaneous application of both these features is needed to alleviate the stalling problem and achieve 
near-ML performance in higher-order QAM. 

Figure [T2T a) shows that the R-MCMC-R algorithm is able to achieve almost sphere decoder per- 
formance for 4-/16-/64-QAM in multiuser MIMO with K = N = 16. Similar performance plots for 
4-/16-/64-QAM for K = N = 32 are shown in Fig. fT27b). where the performance of R-MCMC-R 
algorithm is seen to be quite close to unfaded SISO-AWGN performance, which is a lower bound on 
true ML performance. 

5) Performance/complexity comparison with other detectors: In Table-Q] we present a comparison of the 
BER performance and complexity of the proposed R-MCMC-R algorithm with those of other detectors in 
the literature. Comparisons are made for systems with K = N = 16,32 and 4-/16764-QAM. Detectors 
considered for comparison include: i) random-restart reactive tabu search (R3TS) algorithm reported 
recently in ETTl . which is a local neighborhood search based algorithm, and it) fixed-complexity sphere 
decoder (FSD) reported in ll38l . which is a sub-optimal variant of sphere decoder whose complexity is 
fixed regardless of the operating SNR. Table-Q] shows the complexity measured in average number of real 
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operations at a BER of 10~ 2 and the SNR required to achieve BER for the above three detection 
algorithms. It can be seen that both R-MCMC-R and R3TS perform better than FSD. Also, R-MCMC-R 
achieves the best performance at the lowest complexity compared to R3TS and FSD for K = N = 16 
with 16-QAM and 64-QAM. In 4-QAM and in K = N = 32, R-MCMC-R achieves same or slightly 
better performance than R3TS at some increased complexity compared to R3TS. 

IV. Proposed MCMC based channel estimation 

In the previous section, we assumed perfect channel knowledge at the BS receiver. In this section, 
we relax the perfect channel knowledge assumption and propose an MCMC based channel estimation 
algorithm. 

A. System model 

Consider the uplink multiuser MIMO system model in dH). As in Sec. [Til perfect synchronization among 
users' transmissions is assumed. But the assumption of perfect knowledge of the channel matrix at the 
BS is relaxed here. The channel matrix is estimated based on a pilot based channel estimation scheme. 
Transmission is carried out in frames, where each frame consists of several blocks as shown in Fig. [13] 
A slow fading channel (typical with no/low mobility users) is assumed, where the channel is assumed 
to be constant over one frame duration. Each frame consists of a pilot block (PB) for the purpose of 
initial channel estimation, followed by Q data blocks (DB). The pilot block consists of K channel uses in 
which a A'-length pilot symbol vector comprising of pilot symbols transmitted from K users (one pilot 
symbol per user) is received by N receive antennas at the BS. Each data block consists of K channel 
uses, where K number of if -length information symbol vectors (one data symbol from each user) are 
transmitted. Taking both pilot and data channel uses into account, the total number of channel uses per 
frame is (Q + 1)K. Data blocks are detected using the R-MCMC-R algorithm using an initial channel 
estimate. The detected data blocks are iteratively used to refine the channel estimates during data phase 
using the proposed MCMC based channel estimation algorithm. 

B. Initial channel estimate during pilot phase 

Let Xp = [xp(0), Xp(l), • • • ,Xp(K — 1)] denote the the pilot symbol vector transmitted from user k 
in K channel uses in a frame. Let X P = [(xp) T , (xp) T , • • • , (-x.p) T ] T denote the K x K pilot matrix 
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formed by the pilot symbol vectors transmitted by the users in the pilot phase. The received signal matrix 
at the BS, Y P , of size N x K is given by 

Y P = H C X P + N P , (12) 

where N P is the N x K noise matrix at the BS. We use the pilot sequence given by 

xp = [0 (fc _i) x i p 0( K -k)xl]- ( 13 ) 

We choose p = \JKE S , where E s is the average symbol energy. Using the scaled identity nature of x P , 
an initial channel estimate H c is obtained as 

H c = Y P /p. (14) 

C. Data detection using initial channel estimate 

Let = [xf (0), Xj(l), • • • , x^{K — 1)] denote the data symbol vector transmitted from user k in 
K channel uses during the ith data block in a frame. Let X.; = [(x*) T , (x^) T , • • • , (xf ) T ] T denote the 
K x K data matrix formed by the data symbol vectors transmitted by the users in the ith data block 
during data phase, i = 1, 2, ■ ■ ■ , Q. The received signal matrix at the BS in the ith data block, Yj of 
size N x K, is given by 

Yj = H c X.j + Nj, (15) 

where Nj is the N xK noise matrix at the BS during ith data block. We perform the detection on a vector 
by vector basis using the independence of data symbols transmitted by the users. Let yf' denote the ith 
column of Yj, t = 0, 2, • • • ,K-1. Denoting the ith column of Xj as xf } = [xj (t) , xf (t) , • • • , xf (t)] T , 
we can rewrite the system equation (ITZb as 

yf =H c x?)+n« (16) 

where nf' is the ith column of Nj. The initial channel estimate H c obtained from (fT4l is used to detect 
the transmitted data vectors using the R-MCMC-R algorithm presented in Sec. ITTT1 

From (fT2l and (IT4b . we observe that H c = H c + N P /p. This knowledge about imperfection of channel 
estimates is used to calculate the statistics of error-free ML cost required in the R-MCMC-R algorithm. 
In Sec. Um we have observed that in case of perfect channel knowledge, the error-free ML cost is nothing 
but ||n 2 ||. In case of imperfect channel knowledge at the receiver, at channel use t, 

||yf)-H c xff = || n f-N P xf/pf. 
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Each entry of the vector nf' — Npxf'/p has mean zero and variance 2a 2 . Using this knowledge at the 
receiver, we detect the transmitted data using R-MCMC-R algorithm and obtain Scf . Let the detected 
data matrix in data block i be denoted as Xj = , • • • , x^ -1 ']. 



D. Channel estimation using MCMC algorithm in data phase 

Let Y tot = [Y P Yi Y 2 • • • Y Q ], X tot = [X P X x X 2 • • • X Q ], and N tot = [N P Ni N 2 • • • N Q ] denote 
the matrices corresponding to one full frame. We can express Ytot as 

Y tot = H c X tot + Ntot. (IV) 

This system model corresponding to the full frame is converted into a real-valued system model as done 
in Sec. [n] That is, (fT71) can be written in the form 



Y = HX + N, 



where 

" K(Y to t) 
3(Y t0 t) 

Bex**) 

»(X to t) 

Equation ( fT8l ) can be written as 



X 



- 9 (Y to t) 
»(Ytot) 

- 9(Xtot) 
K(Xtot) 



H 



N 



»(H C ) 


- 9(H C ) 


_ 9(H C ) 


K(H C ) _ 


»(Ntot) 


- ^ (Ntot) 


9(N to t) 


3f?(N to t) 



Y T = X T H T + N T . 



Vectorizing the matrices Y T , H T , and N T , we define 

r = vec(Y T ), g = vec(H. T ), z = vec(N T ). 
With the above definitions, ( fT9l) can be written in vector form as 

r = I 2 jv <8> X T g + z. 



(18) 



(19) 



(20) 



Now, our goal is to estimate g knowing r, estimate of S and the statistics of z using an MCMC approach. 
The estimate of S is obtained as 



S = I 



2N 



X 1 
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K(X tot ) -3f(X fot ) 
where X = 

9f(Xtot) K(X tot ) 
algorithm is obtained as 



and X tot = [X P Xi X 2 • • • Xq]. The initial vector for the MCMC 
g(°) = vec(H T ), (21) 



H 



(22) 



where 

' 5i(H c ) - 9f(H c ) 
3?(H C ) »(H C ) 

i) Gibbs sampling based estimation: The vector g is of length AKN x 1. To estimate g, the algorithm 
starts with an initial estimate, takes samples from the conditional distribution of each coordinate in g, and 
updates the estimate. This is carried out for a certain number of iterations. At the end of the iterations, 
a weighted average of the previous and current estimates are given as the output. 

Let the ith coordinate in g be denoted by gi, and let g„j denote all elements in g other than the ith 
element. Let s q denote the qth column of S. The conditional probability distribution for the ith coordinate 
is given by 



V ( S, g_i ) oc p(gi).p r|^,S,g 



oc exp {-\gi\ ) exp 



I 1 " 2^,q=l,q^i 9q s q 9i s i 



~\9i\ - 

exp ' 



9iSi\\ 2 



(23) 
(24) 

(25) 
(26) 



where rW = r - T^ig^i 9<i%' rW = [r®, 0] T , and s, = [si, cr] T . The quantity \\r® - ^Si|| 2 in (f26 



is minimized for gi 



lis, 



-. Hence, we can write 



|f«-^|| 2 



=(0 



s (<) 



+ 5i 



(r®) T sA_ 



ff»f s. 



(27) 



Hence, 



p (#i|r, S,g_j oc exp 



0i 



11 = 



(28) 



This distribution is Gaussian with mean 
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and variance 

■2 



° l = 5KP- (30) 

Let M AX denote the number of MCMC iterations. In each MCMC iteration, for each coordinate, the 
probability distribution specified by its mean and variance has to be calculated to draw samples. Let the 
mean and variance in rth MCMC iteration and ith coordinate be denoted as fi 9i and cr^. , respectively, 
where r = 1, 2, • • • , MAX and i = 1, 2, ■ ■ ■ , AKN. We use g^in (1211) . which is the estimate from the 
pilot phase, as the initial estimate. In the rth MCMC iteration, we obtain g( r ) from g( r ~ 1 ) as follows: 
. Take g( r ) =g( r ~ 1 ). 

• Update the ith coordinate of g( r ) by sampling from J\f(lig} o^/^l for all i. Let g^ denote the 
updated ith coordinate of gW. 

(r) ( (? M -M M ) 2 \ 

• Compute weights a\ = exp I — % 2 ^ 2 { l] I for all i. This gives more weight to samples closer 
to the mean. 

After MAX iterations, we compute the final estimate of the ith coordinate, denoted by g*, to be the 
following weighted sum of the estimates from previous and current iterations: 

a * = ^ r=1 rri yi Gl) 

Finally, the updated 2N x 2K channel estimate H is obtained by restructuring g* = [#*, <?2j ' ' ' ^XknV 
as follows: 

H-(p,q)=g*, p=l,2,--- ,2N, q = l,2,--- ,2K, (32) 

where n = 2N(p — 1) + q and H(p, q) denotes the element in the pth row and qth column of H. A 
complete listing of the proposed MCMC algorithm for channel estimation is given in the next page. 

The matrix H obtained thus is used for data detection using R-MCMC-R algorithm. This ends one 
iteration between channel estimation and detection. The detected data matrix is fed back for channel 
estimation in the next iteration, whose output is then used to detect the data matrix again. This iterative 
channel estimation and detection procedure is earned out for a certain number of iterations. 



E. Performance Results 

In Fig. [LlT a). we plot the mean square error performance (MSE) of the iterative channel estima- 
tion/detection scheme using proposed MCMC based channel estimation and R-MCMC-R based detection 
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Algorithm 2 Proposed MCMC algorithm for channel estimation 



1: input: r, , S, a 2 , g(°) : initial vector G R 4KN ; MAX: max. # iterations; 
2:r = l; 5 *(0)=g(°); af } = 0, Vi = 1, 2, • • • , 4KW; 
3: while r < MAX do 



4: g( r ) = g( r_1 ); 

5: r* = r-SgW; 

6: for i = 1 to AKN do 

7: Compute = r * + gj r) s"i, fW = [?«, 0] T , and s; = [s;, ct] t ; 

8: Compute /4- } = ^Si^ and a 2 g } r) = ^jif^p-; 



9: Sample gj> ~JV(a47. of/ 
10: r* = ?W - 
11: Compute a\ = exp I — v 2 ^ 2 {r ] ' I ; 

13: end for 

14: r = r + 1; 

15: end while 

16: output: g* = g*(MAX). g* : output solution vector 



with 4-QAM for K = N = 128 and Q = 9. In the simulations, the R-MCMC-R algorithm parameter 
values used are the same as in Sec. IIII-C4I For the MCMC channel estimation algorithm, the value 
of MAX used is 2. The MSEs of the initial channel estimate, and the channel estimates after 1 and 2 
iterations between channel estimation and detection are shown. For comparison, we also plot the Cramer- 
Rao lower bound (CRLB) for this system. It can be seen that in the proposed scheme results in good 
MSE performance with improved MSE for increased number of iterations between channel estimation 
and detection. For the same set of system and algorithm parameters in Fig. [HJa), we plot the BER 
performance curves in Fig. [147 b). For comparison, we also plot the BER performance with perfect 
channel knowledge. It can be seen that with two iterations between channel estimation and detection the 
proposed MCMC based algorithms can achieve 10~ 3 BER within about 1 dB of the performance with 
perfect channel knowledge. 

V. Equalization/Channel Estimation in Frequency Selective Fading 

In the previous sections, we considered frequency non-selective fading. In this section, we consider 
equalization and channel estimation in frequency selective fading. Multicarrier techniques like OFDM 
can transform a frequency selective channel into several narrow-band frequency non-selective channels. 
The MCMC based algorithms proposed in the previous sections can be employed for signal detection and 
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channel estimation on the resulting narrow-band channels. However, multicarrier systems face the peak- 
to-average power ratio (PAPR) problem. Single carrier (SC) block transmission schemes are considered 
as good alternatives to address the PAPR issue that arises in multicarrier systems IT391 - |45l . We consider 
cyclic prefixed single-carrier (CPSC) signaling, where the overall channel includes a Fourier transform 
(FFT) operation so that the transmitted symbols are estimated from the received frequency-domain signal 
l39l , HD1 . fiTll . In ll46l . the optimal training sequence that minimizes the channel estimation mean square 
error of the linear channel estimator is shown to be of length KL per transmit antenna. Blind/semi-blind 
channel estimation methods can be considered, but they require long data samples and the complexity is 
high B71 . ll48l . Here, we consider channel estimation using uplink pilots and iterations between channel 
estimation and equalization in multiuser MIMO CPSC systems. 

A. Multiuser MIMO CPSC system model 

Consider the uplink multiuser MIMO system shown in Fig. [TJ The channel between each pair of 
user transmit antenna and BS receive antenna is assumed to be frequency selective with L multipath 
components. Let h^' k \l) denote the channel gain between kth user and jth receive antenna at the BS 
on the /th path, which is modeled as CN(0,Qf). As in Sec. BU perfect synchronization among users' 
transmissions is assumed. Transmission is carried out in frames, where each frame consists of several 
blocks as shown in Fig. [15] As in IIV-AI the channel is assumed to be constant over one frame duration. 
Each frame consists of a pilot block for the purpose of initial channel estimation, followed by Q data 
blocks. The pilot block consists of (L — 1) + KL channel uses. In the first L — 1 channel uses in the 
pilot block, padding of L — 1 zeros is used to avoid inter-frame interference. In each of the remaining 
KL channel uses, a iT-length pilot symbol vector comprising of pilot symbols transmitted from K users 
(one pilot symbol per user) is received by N receive antennas at the BS. Each data block consists of 
I + L — l channel uses, where / number of K-length information symbol vectors (one data symbol from 
each user) preceded by (L — l)-length cyclic prefix from each user (to avoid inter-block interference) are 
transmitted. With Q data blocks in a frame, the number of channel uses in the data part of the frame is 
{I + L — l)Q. Taking both pilot and data channel uses into account, the total number of channel uses 
per frame is (L + l)K + (I + L - l)Q - 1. Data blocks are detected using the R-MCMC-R algorithm 
using an initial channel estimate. The detected data blocks are then iteratively used to refine the channel 
estimates during data phase. The padding of L — 1 zeros at the beginning of the pilot block makes the 
transmitters silent during the first L — l channel uses in a frame. The channel output in these channel 
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uses are ignored at the receiver. Accordingly, the Oth channel use in a frame at the receiver is taken to 
be the channel use in which the first pilot symbol in the frame is sent. 



B. Initial channel estimate during pilot phase 

Let b fc = [b k (0), b k (l), • • • , b k (KL — 1)] denote the pilot symbol vector transmitted from user k in 
KL channel uses in a frame. The signal received by the jth receive antenna at the BS during pilot phase 
in the nth channel use is given by 

K L-l 

V» = E E h^ k \l)b k {n -l) + 4(n), (33) 

k=l 1=0 

j = 1, 2 • • • , N, n = 0, 1, • • • , KL — 1, where the subscript P in yp(ra) and q^{n) denotes pilot phase. 
{(?p(n)} are noise samples modeled as i.i.d. CAf(0, a 2 ). We use the training sequence given by 

b fc = [°(fc-i)Lxi b 0(/f-(fc-i))L-i)xi]- (34) 
Writing (l33l) in matrix notation after substituting (l34l) . we get 

yi = Bph^' + q J p , j = l,2,---,N, (35) 

where 

yi = [^(0)^(1), • • • ,yl(KL - l)f, V = [(h^)f , • • • , (h^)f, • • • , (h^ff, 
h m = [^)( ),^ fc )(l),... ,h^ k \L- 1)} T , 4 = [qi(0),qi(l),--- , ql(K L - l)f , 
Bp = [B Pi Bp 2 • • • B Pl< ], B Pfc = [0 Lx ( fc _!) L UIl 0Lx(K-k)L] T ■ 



We use 6=^ K E s \ Y^=q £l 2 j to maintain the same average receive SNR in both pilot phase and data 
phase. 

From the signal observed at the jth receive antenna from time to KL — 1 during pilot phase, we 
obtain an initial estimate the channel vector b? using the scaled identity nature of B P , as 

y=yi/b, j = l,2,---,N. (36) 

These initial channel estimates are used to detect the data vectors in the data phase. 
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C. Equalization using initial channel estimates 

In the data phase, let & k = \a k (0), a k (1), • • • , a k (I + L — 2)] T denote the data vector of size (I + L — 
1) x 1, which includes (L — 1) cyclic prefix symbols and / information symbols transmitted from fcth 
user during 2th data block, i = 1, 2, • • • , Q. The signal received at jth receive antenna at nth channel 
use of ith data block is given by 



K L-l 
k=l 1=0 



(37) 



j = 1, 2 ■ ■ ■ , N, n = 0, 1, ■ ■ ■ , I + L — 2, where qj (n) is the noise sample modeled as i.i.d. CM(0, a 2 ). 



A 



Define the following vectors and matrices: y;- = [yj (L — 1), y\ (L), • • • ,y\{I + L — 2)] T , q| = [qj (L 



1), qj{L), ■■■ ,q>{I + L- 2)) T , = [a k (L - 1), a k (L), • • • , a k (I + L - 2)] T , and H^ fe as a (/ + L) x / 
circulant matrix with [h^' k \0), h^' k \l), ■ ■ ■ ,h& k \L- 1),0, 
definitions, d37l ) can be written in the form 



, 0] T as the first column. With these 



K 



yH£H^xf+qi, j 1.2. 



iV. 



fe=i 



We can write (l38l) as 



(38) 



(39) 



where y 4 = [(yi) T , (yf) T 



and 



[(xir,(x?f,...,(xffi T 



[(q 4 1 ) T ,(q?) T ,--,(qf) T ] T , 



H 



H 1 ' 1 H 1,2 ■ 
H 2 ' 1 H 2,2 ■ 



H 1 ^ 
H 2 * 



1) Equalization using R-MCMC-R method: The R-MCMC-R algorithm proposed in Sec. |III] is em- 
ployed in the frequency domain using FFT based processing for equalization. The circulant matrix H jf,fc 
can be decomposed as 

H i,fe = F ^ D i,fc F/; (40) 

where Fj is I x / DFT matrix, and D J ' fc is a diagonal matrix with its diagonal elements to be the DFT 
of the vector [/i^' fc )(0), h^' k \l), ■■■ , h^' k \L - 1), 0, • • • , 0] T . Taking the DFT of y{ in (gg), we get 



K 



1 5 ^ 



(41) 



fc=i 
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where zs?' = [^'(0), z{{\), ■ ■ ■ ,z}(I - 1)] T , b* = Fjx* = [6?(0), &?(!),-•• - 1)] T , and = 



F/q| = [w\ (0), (1), • • • ,wj(I — l)] T ■ Writing (|4TI ) in matrix form, we get 

z, = Dbi + Wi, i = l,2, ■■■ ,Q, 



(42) 



where z, = [(zjf, (zff, (zf ff, b, = [(b^f, (bff, ■ ■ • , (bf) T } T , w, = [(wjf, (wff, ■■■ , (wf f p, 
and 



D 



D 11 D 12 
D 21 D 22 



Y)N,1 J}N,2 . . . Y) N ' K 

Rearranging the terms, we can also write (l42l) as 

Zj = Dbj + Wj, % = 1,2, ••• ,Q, 

where 



z. 



(43) 



Zi(0) 




b.(0) 










Wi(0) 










'D(O) ••• 











, bj = 


bi(l) 


, D = 






Wi(l) 













D(7-l) _ 






_ Zi(/-1) _ 




_ bi(J-l) _ 








Wi(/-1) 


= [z}(m),zf 




^f(m)] T , b l (m) = [6i(m),6 2 (m), 







Wj(m) = [u^ (m) , w 2 (m) , • • • , u^(m)] T , and 



D(m) 



D 2 ' 1 ^) D 2 > 2 (m)--- B 2 > K (m) 



D N ^(m) B N ' 2 (m) ■ ■ ■ D N > K (m) 

D 3 '> fe (m) is the rath diagonal element of the matrix D J,fc . Also, bj = Fjq, where F = Fj ® Ik, 
Xi = [a}{L -!)■■■ af{L - l),a}(L) ■ ■ ■ af(L), ■ ■ ■ , aj(I + L - 2) • • • of (7 + L- 2)] T Now, we have 



z» = DFxj + Wj 

= Hxj + Wj, i = l,2, ••• ,Q, 



(44) 



where H = DF. 
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For each i in (l44l ). we run R-MCMC-R detection algorithm and detect the information symbols in the 
ith block. In the first iteration of data detection, we use the channel estimates from (l36l) to calculate H, 
an estimate of H. Each coordinate of the vector (z, — Hxj) has zero mean and 2a 2 variance. Using 
this knowledge, the statistics of the ML cost of error-free vectors are recalculated in the R-MCMC- 
R algorithm. R-MCMC-R detector outputs are denoted by xf, k = 1, ■ ■ ■ , K, i = 1, ■ ■ ■ ,Q. These 
output vectors are then used to improve the channel estimates through iterations between equalization 
and channel estimation. The channel estimation in these iterations is based on MCMC approach presented 
next. 

D. MCMC based channel estimation in data phase 
Consider (|4TT ). which can be rewritten as 

K 

z{ B i&" • w < • j = h2,---,N, (45) 

k=l 

where = diag(b k ), dP ,k is a vector consisting of the diagonal elements of matrix D J ' fe , which is the 
/-point DFT of i$ ,k ) (zero padded to length /), i.e., d^ k = Fj X Lh^' k \ where F/ X L is the matrix with 
the first L columns of Fj. Now, d45l ) can be written as 

K 

z^ = ^BfF, xL h^ fc )+w^. (46) 

k=l 

Defining = B k Fi X L, we can write d46b as 

4 = AiW+w{, i = l,-..,Q, (47) 
where Aj = [A] A? • • • Af ]. We can write (l47l ) as 

z J ' = Ah J +w J , (48) 

where 







Ai 




j 

Wj 


4 


, A = 


A 2 


, w J = 


j 

w 2 






. A « . 







Using the signal received at antenna j from blocks 1 to Q in a frame (i.e., using i?) and the matrix A 
which is formed by replacing the information symbols {x^"} in A by the detected information symbols 
{x^}, the channel coefficients {h J } are estimated using MCMC based estimation technique presented in 
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Sec- IIV-DTI This ends one iteration between channel estimation and detection. The detected data matrix is 
fed back for channel estimation in the next iteration, whose output is then used to detect the data matrix 
again. This iterative channel estimation and detection procedure is carried out for a certain number of 
iterations. 

E. Performance Results 

In Fig. [T6l we plot the BER performance of the iterative channel estimation/detection scheme using 
proposed MCMC based channel estimation and R-MCMC-R detection algorithms in uplink multiuser 
MIMO system on frequency selective fading with K = N = 16, L = 6, 1 = 64, Q = 9 and 4-QAM. 
For the same settings, we also plot the BER performance of the R-MCMC-R algorithm with perfect 
channel knowledge. BER improves with increasing number of iterations between channel estimation and 
detection. It can be observed that the proposed scheme with iterations is able to achieve performance 
which is close to the performance with perfect channel knowledge. 

VI. Conclusions 

We proposed novel MCMC based detection and channel estimation algorithms that achieved near- 
optimal performance on the uplink in large-scale multiuser MIMO systems. The proposed R-MCMC-R 
detection algorithm was shown to alleviate the stalling problem and achieve near-ML performance in 
large systems with tens to hundreds of antennas and higher-order QAM. Key ideas that enabled such 
attractive performance and complexity include i) a randomized sampling strategy that gave the algorithm 
opportunities to quickly exit from stalled solutions and move to better solutions, and ii) multiple random 
restarts that facilitated the algorithm to seek good solutions in different parts of the solution space. 
Multiple restarts alone (without randomized sampling) could not achieve near-ML performance at low 
complexity. Randomized sampling alone (without multiple restarts) could achieve near-ML performance 
at low complexity in the case of 4-QAM. But for higher-order QAM (16-/64-QAM) randomized sampling 
alone was not adequate. Joint use of both randomized sampling as well as multiple restarts was found 
to be crucial to achieve near-ML performance for 16-/64-QAM. We also proposed an MCMC based 
channel estimation algorithm which, in an iterative manner with the R-MCMC-R detection, achieved 
performance close to performance with perfect channel knowledge. We employed the proposed MCMC 
receiver architecture in the frequency domain for receiving CPSC signals on frequency selective fading 
between users and the BS. While simulations were used to establish the attractiveness of the algorithm in 
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performance and complexity, a theoretical analysis that could explain its good performance is important 
and challenging, which is a topic for future work. We have considered perfect synchronization and 
single-cell scenario in this paper. Other system level issues including uplink synchronization and multi- 
cell operation in large-scale MIMO systems can be considered as future work. 
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Fig. 1. Large-scale multiuser MIMO system on the uplink. 




Fig. 2. BER performance of the proposed R-MCMC algorithm in comparison with those of sphere decoder and MCMC 
algorithm with different values of a in uplink multiuser MIMO with K = N = 16, 4-QAM, and no power imbalance. 
Performance of R-MCMC is almost the same as the sphere decoder performance. 
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Uplink multiuser MIMO 
4-QAM,BER = 0.01 
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Number of users (K = N) 

Fig. 3. Complexity of the R-MCMC algorithm in average number of real operations per bit as a function of K = iV with 
4-QAM and no power imbalance at 10~ 2 BER. 




Fig. 4. BER performance of the R-MCMC algorithm in uplink multiuser MIMO with K = N = 8, 16, 32, 64, 128, 4-QAM 
and no power imbalance. 



31 




Number of users (K = N) 



Fig. 5. Average SNR required to achieve 10 3 BER as a function of number of users (K = N) in uplink multiuser MIMO 
with 4-QAM and no power imbalance. 




Fig. 6. BER performance of the R-MCMC algorithm in uplink multiuser MIMO with K — N = 16, 128 and 4-QAM. Power 
imbalance with ct^'s uniformly distributed between -3 dB and 3 dB. 
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Fig. 7. Comparison between R-MCMC performance and sphere decoder performance in uplink multiuser MIMO with K 
N = 16 and 4-/16-/64-QAM. 
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Fig. 8. Least ML cost up to nth iteration versus n in conventional MCMC and R-MCMC for different initial vectors in 
multiuser MIMO with K = N = 16. 
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Fig. 9. BER performance of R-MCMC as a function of number of restarts in multiuser MIMO with K = N = 16 and 
16-QAM at SNR = 18 dB. 
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Fig. 10. Histograms of standardized ML costs of correct and incorrect outputs from R-MCMC with restarts in multiuser MIMO 
with K = N = 8 and 4-/ 16-QAM. 
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Fig. 11. BER performance between conventional MCMC (without and with restarts), proposed R-MCMC (without and with 
restarts), and sphere decoder in uplink multiuser MIMO with K — N — 16 and 16-QAM. 
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Fig. 12. BER performance of R-MCMC-R algorithm in uplink multiuser MIMO with K = N = 16 and 32 and higher-order 
modulation (4-/16-/64-QAM). 
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Fig. 13. Frame structure for multiuser MIMO system in frequency non-selective fading. 
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Fig. 14. MSE and BER performance of iterative channel estimation/detection using MCMC based channel estimation and 
R-MCMC-R based detection in uplink multiuser MIMO systems on frequency non-selective fading with K — N — 128, Q — 9, 
4-QAM. 
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Fig. 15. Frame structure for multiuser MIMO CPSC system in frequency selective fading. 
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Fig. 16. BER performance of iterative channel estimation/detection using MCMC based channel estimation and R-MCMC-R 
detection in uplink multiuser MIMO systems on frequency selective fading with K = N = 16, L = 6,1 = 64, Q = 9, 4-QAM. 
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Modulation 


Algorithm 


Complexity in average number of real operations in x 10 6 
and SNR required to achieve 10~ 2 BER 


K = N = 16 


K = N = 32 


Complexity 


SNR 


Complexity 


SNR 


4-QAM 


R-MCMC-R (prop.) 


0.1424 


9 dB 


.848 


8.8 dB 


R3TS ED 


0.1877 


9 dB 


0.6823 


8.8 dB 


FSD in (3D 


0.1351 


10.1 dB 


4.9681 


10.3 dB 


16-QAM 


R-MCMC-R (prop.) 


1.7189 


17 dB 


15.158 


16.7 dB 


R3TS ED 


3.968 


17 dB 


7.40464 


17 dB 


FSD (38) 


4.836432 


17.6 dB 


4599.5311 


17.8 dB 


64-QAM 


R-MCMC-R (prop.) 


11.181 


24 dB 


166.284 


24 dB 


R3TS ED 


25.429504 


24.2 dB 


77.08784 


24.1 dB 


FSD in (38) 


305.7204 


24.3 dB 


* 


★ 



TABLE I 

Performance and complexity comparison of proposed R-MCMC-R detector with other detectors in iLTfl 
and 11381 for for K = N = 16, 32 and 4-/16-/64-QAM. * : Not simulated due to prohibitive complexity. 



